Temporal-attentive Covariance Pooling Networks for Video Recognition

Neural Information Processing Systems

For the video recognition task, a global representation summarizing the whole content of a video snippet plays an important role in the final performance. However, existing video architectures usually generate it with simple global average pooling (GAP), which has limited ability to capture the complex dynamics of videos. For the image recognition task, there is evidence that covariance pooling has stronger representation ability than GAP. Unfortunately, the plain covariance pooling used in image recognition is an orderless representation, which cannot model the spatio-temporal structure inherent in videos. Therefore, this paper proposes Temporal-attentive Covariance Pooling (TCP), inserted at the end of deep architectures, to produce powerful video representations.
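As a minimal sketch of the contrast between the two pooling schemes (in NumPy; the channel count and the flattening of spatio-temporal positions into one sample axis are illustrative assumptions, not the paper's architecture):

```python
import numpy as np

def gap(features):
    """Global average pooling: one mean value per channel."""
    # features: (C, N) matrix of C-dim descriptors at N spatio-temporal positions
    return features.mean(axis=1)                     # shape (C,)

def covariance_pooling(features):
    """Plain (orderless) covariance pooling: channel co-activations."""
    C, N = features.shape
    mu = features.mean(axis=1, keepdims=True)
    centered = features - mu
    return centered @ centered.T / (N - 1)           # shape (C, C)

rng = np.random.default_rng(0)
X = rng.standard_normal((64, 7 * 7 * 8))   # e.g. 64 channels over a 7x7 map and 8 frames
print(gap(X).shape)                 # (64,)
print(covariance_pooling(X).shape)  # (64, 64)
```

The covariance matrix retains second-order interactions between channels that the per-channel mean discards, which is the extra capacity the abstract refers to; TCP additionally makes the pooling temporally attentive rather than orderless.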






Martian time-series unraveled: A multi-scale nested approach with factorial variational autoencoders

Siahkoohi, Ali, Morel, Rudy, Balestriero, Randall, Allys, Erwan, Sainton, Grégory, Kawamura, Taichi, de Hoop, Maarten V.

arXiv.org Artificial Intelligence

Unsupervised source separation involves unraveling an unknown set of source signals recorded through a mixing operator, with limited prior knowledge about the sources, and only access to a dataset of signal mixtures. This problem is inherently ill-posed and is further challenged by the variety of time-scales exhibited by sources in time series data from planetary space missions. As such, a systematic multiscale unsupervised approach is needed to identify and separate sources at different time-scales. Existing methods typically rely on a preselected window size that determines their operating time-scale, limiting their capacity to handle multi-scale sources. To address this issue, instead of directly operating in the time domain, we propose an unsupervised multi-scale clustering and source separation framework by leveraging wavelet scattering covariances that provide a low-dimensional representation of stochastic processes, capable of effectively distinguishing between different non-Gaussian stochastic processes. Nested within this representation space, we develop a factorial Gaussian-mixture variational autoencoder that is trained to (1) probabilistically cluster sources at different time-scales and (2) independently sample scattering covariance representations associated with each cluster. As the final stage, using samples from each cluster as prior information, we formulate source separation as an optimization problem in the wavelet scattering covariance representation space, resulting in separated sources in the time domain. When applied to seismic data recorded during the NASA InSight mission on Mars, containing sources varying greatly in time-scale, our multi-scale nested approach proves to be a powerful tool for discriminating between such different sources, e.g., minute-long transient one-sided pulses (known as "glitches") and structured ambient noises resulting from atmospheric activities that typically last for tens of minutes. 
These results provide an opportunity to conduct further investigations into the isolated sources related to atmospheric-surface interactions, thermal relaxations, and other complex phenomena.
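The final optimization stage described above can be sketched in miniature. The snippet below is a toy illustration only: `rep` is a crude log-power-spectrum stand-in for wavelet scattering covariances, and the per-cluster priors are computed directly from clean examples rather than sampled from the trained factorial variational autoencoder.

```python
import numpy as np
from scipy.optimize import minimize

def rep(x):
    """Stand-in representation: log power spectrum. The paper uses wavelet
    scattering covariances; this toy surrogate is only for illustration."""
    return np.log(np.abs(np.fft.rfft(x)) ** 2 + 1e-8)

rng = np.random.default_rng(1)
n = 64
t = np.arange(n)
source = np.sin(2 * np.pi * 5 * t / n)        # structured "source"
noise = 0.3 * rng.standard_normal(n)          # broadband "ambient noise"
mix = source + noise

# Priors: representations of clean examples, standing in for samples
# drawn from each cluster of the trained generative model.
r_src = rep(np.sin(2 * np.pi * 5 * t / n))
r_noi = rep(0.3 * rng.standard_normal(n))

def loss(s):
    # Pull the candidate source toward the source prior and the residual
    # (mix - s) toward the noise prior, both in representation space.
    return np.sum((rep(s) - r_src) ** 2) + np.sum((rep(mix - s) - r_noi) ** 2)

res = minimize(loss, x0=mix, method="L-BFGS-B", options={"maxiter": 200})
separated = res.x
```

The key design choice mirrored here is that the data-fidelity and prior terms live in the representation space, not the time domain, so sources at very different time-scales can be compared on equal footing.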


Unearthing InSights into Mars: Unsupervised Source Separation with Limited Data

Siahkoohi, Ali, Morel, Rudy, de Hoop, Maarten V., Allys, Erwan, Sainton, Grégory, Kawamura, Taichi

arXiv.org Artificial Intelligence

Source separation involves the ill-posed problem of retrieving a set of source signals that have been observed through a mixing operator. Solving this problem requires prior knowledge, which is commonly incorporated by imposing regularity conditions on the source signals, or implicitly learned from existing data through supervised or unsupervised methods. While data-driven methods have shown great promise in source separation, they often require large amounts of data, which rarely exist in planetary space missions. To address this challenge, we propose an unsupervised source separation scheme for domains with limited data access that involves solving an optimization problem in the wavelet scattering covariance representation space, an interpretable, low-dimensional representation of stationary processes. We present a real-data example in which we remove transient, thermally induced microtilts (known as glitches) from data recorded by a seismometer during NASA's InSight mission on Mars. Thanks to the wavelet scattering covariances' ability to capture non-Gaussian properties of stochastic processes, we are able to separate glitches using only a few glitch-free data snippets.


Subspace Clustering for Action Recognition with Covariance Representations and Temporal Pruning

Paoletti, Giancarlo, Cavazza, Jacopo, Beyan, Cigdem, Del Bue, Alessio

arXiv.org Artificial Intelligence

Despite the fact that subspace clustering has become a powerful technique for problems such as face clustering or digit recognition, its applicability to problems like skeleton-based HAR has been explored by only a limited number of works [7], [8], [9]. This is due to many operative limitations, including how to handle the temporal dimension, the inherent noise present in skeletal data, and the related computational cost. Given a trimmed sequence, in which a single action or activity is assumed to be present, the final goal of HAR is to correctly classify it. Although significant progress has been made in recent years, accurate action recognition in videos is still a challenging task because of the complexity of the visual data, e.g., due to varying camera viewpoints, occlusions, and abrupt changes in lighting conditions.
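A bare-bones sketch of the subspace-clustering idea referenced above: ridge-regularized self-expression followed by a spectral split. The function name and toy data are illustrative assumptions; the actual pipeline operates on covariance representations of skeleton sequences and uses sparse or low-rank formulations rather than this ridge surrogate.

```python
import numpy as np

def subspace_cluster(X, lam=0.1):
    """Cluster columns of X lying on a union of subspaces (2 clusters).
    Ridge self-expression + spectral split; a simplified surrogate for the
    sparse/low-rank formulations in the subspace-clustering literature."""
    n = X.shape[1]
    G = X.T @ X
    C = np.linalg.solve(G + lam * np.eye(n), G)   # X ~= X @ C
    np.fill_diagonal(C, 0.0)                      # forbid trivial self-representation
    W = np.abs(C) + np.abs(C).T                   # symmetric affinity graph
    L = np.diag(W.sum(axis=1)) - W                # graph Laplacian
    _, vecs = np.linalg.eigh(L)
    emb = vecs[:, :2]                             # smallest-eigenvalue embedding
    # Points in the same graph component share an embedding row.
    return (np.linalg.norm(emb - emb[0], axis=1) > 1e-6).astype(int)

# Two orthogonal 1-D subspaces (lines) in R^3, e.g. two distinct "actions".
u, v = np.array([1.0, 0, 0]), np.array([0, 1.0, 0])
X = np.column_stack([c * u for c in (1, 2, 3)] + [c * v for c in (1, 2, 3)])
print(subspace_cluster(X))   # first three columns in one cluster, last three in the other
```

The self-expression step is what makes the method subspace-aware: each sample is reconstructed from the others, and the reconstruction weights concentrate on samples from the same subspace, yielding a block-structured affinity.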


Deep CNNs Meet Global Covariance Pooling: Better Representation and Generalization

Wang, Qilong, Xie, Jiangtao, Zuo, Wangmeng, Zhang, Lei, Li, Peihua

arXiv.org Artificial Intelligence

Compared with global average pooling in existing deep convolutional neural networks (CNNs), global covariance pooling can capture richer statistics of deep features, and thus has the potential to improve the representation and generalization abilities of deep CNNs. However, integrating global covariance pooling into deep CNNs brings two challenges: (1) robust covariance estimation given high-dimensional deep features and small sample sizes; (2) appropriate use of the geometry of covariances. To address these challenges, we propose global Matrix Power Normalized COVariance (MPN-COV) pooling. MPN-COV conforms to a robust covariance estimator, well suited to the scenario of high dimension and small sample size. It can also be regarded as a power-Euclidean metric between covariances, effectively exploiting their geometry. Furthermore, a global Gaussian embedding method is proposed to incorporate first-order statistics into MPN-COV. For fast training of MPN-COV networks, we propose an iterative matrix square root normalization, avoiding the GPU-unfriendly eigendecomposition inherent in MPN-COV. Additionally, progressive 1x1 and group convolutions are introduced to compress covariance representations. MPN-COV and its variants are highly modular and readily plugged into existing deep CNNs. Extensive experiments on large-scale object classification, scene categorization, fine-grained visual recognition, and texture classification show that our methods are superior to their counterparts and achieve state-of-the-art performance.
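The iterative matrix square root mentioned above is typically realized with coupled Newton-Schulz iterations, which need only matrix multiplications and so avoid eigendecomposition on the GPU. A minimal NumPy sketch (the trace pre-/post-normalization follows the common recipe; the iteration count and toy covariance are illustrative assumptions):

```python
import numpy as np

def matrix_sqrt_newton_schulz(A, iters=20):
    """Approximate A^{1/2} for a symmetric positive-definite A via coupled
    Newton-Schulz iterations: only matrix multiplies, no eigendecomposition."""
    n = A.shape[0]
    norm = np.trace(A)            # pre-normalize so the iteration converges
    Y = A / norm
    Z = np.eye(n)
    for _ in range(iters):
        T = 0.5 * (3.0 * np.eye(n) - Z @ Y)
        Y, Z = Y @ T, T @ Z       # Y -> (A/norm)^{1/2}, Z -> (A/norm)^{-1/2}
    return Y * np.sqrt(norm)      # undo the normalization

rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
cov = B @ B.T + np.eye(8)                      # SPD "covariance" matrix
S = matrix_sqrt_newton_schulz(cov)
print(np.allclose(S @ S, cov, atol=1e-4))
```

Because each step is a handful of dense matrix products, the normalization maps well onto GPU batched GEMM kernels, which is the practical motivation the abstract gives for replacing eigendecomposition.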